(improvement) Cython row_parser: cache ParseDesc for prepared statements#9
Open
mykaul wants to merge 11 commits into
Open
(improvement) Cython row_parser: cache ParseDesc for prepared statements#9mykaul wants to merge 11 commits into
mykaul wants to merge 11 commits into
Conversation
ee3c279 to
9f2f3ff
Compare
Introduce the data layer for Private Link client routes support: - ClientRoutesChangeType enum for CLIENT_ROUTES_CHANGE event types - ClientRouteProxy dataclass and ClientRoutesConfig for user-facing configuration - _Route frozen dataclass for immutable route records - _RouteStore for thread-safe route storage with atomic update/merge and preferred route selection that avoids unnecessary connection_id migration when multiple routes exist for the same host
Add _ClientRoutesHandler which manages the full lifecycle of dynamic address translation via system.client_routes: - initialize(): loads all routes at startup and on control connection reconnect - handle_client_routes_change(): processes CLIENT_ROUTES_CHANGE events with targeted merge or full refresh depending on event data - _query_all_routes_for_connections(): complete refresh query using connection_id IN (...) - _query_routes_for_change_event(): targeted query grouping by connection_id with host_id IN (...) per group - _execute_routes_query(): common query execution and result parsing with proxy address override support - resolve_host(): host_id to (address, port) resolution with DNS lookup
- ClientRoutesEndPointFactory: creates endpoints from system.peers rows by extracting host_id, deferring address translation and DNS resolution until connection time - ClientRoutesEndPoint: endpoint that resolves via _ClientRoutesHandler on each connection attempt, ensuring immediate reaction to route changes and CLIENT_ROUTES_CHANGE events
Cluster: - Add client_routes_config parameter with mutual exclusivity check against endpoint_factory - Create _ClientRoutesHandler and ClientRoutesEndPointFactory when client_routes_config is provided ControlConnection: - Register CLIENT_ROUTES_CHANGE event watcher when handler is present - Forward events to handler via _handle_client_routes_change - Trigger full route re-read on control connection reconnection
Cover ClientRouteEntry/ClientRoutesConfig validation, _RouteStore get/merge operations, _ClientRoutesHandler initialization, ClientRoutesEndPoint resolution with and without route mappings, and SSL check_hostname rejection with client_routes_config.
Add comprehensive integration tests covering: - TCP proxy and NLB emulator infrastructure for simulating private link connectivity - query_routes filtering with different connection/host ID combinations - Full private-link connectivity verifying all driver connections go exclusively through the NLB proxy - Dynamic route updates via REST API with driver reconnection through new proxy ports
Cache the ParseDesc object constructed in recv_results_rows() so that repeated executions of the same prepared statement skip the list comprehensions, ColDesc construction, and make_deserializers() call. The cache is keyed by id(column_metadata). For prepared statements the result_metadata list is stored on PreparedStatement and reused, so id() is stable. On cache hit we verify object identity (cached_ref is column_metadata) and that session-level settings (column_encryption_policy, protocol_version) still match. Implementation details: - _get_or_build_parse_desc: cached path, used only when column_metadata comes from result_metadata (prepared statements with stable id()). - _build_parse_desc: uncached path, used for inline metadata from non-prepared queries that creates a fresh list every execution. - Cache is bounded to 256 entries; cleared entirely when full. - Returns only (desc, column_names, column_types) to the caller, avoiding exposure of internal cache fields. - Thread-safety: dict get/set are atomic in CPython (GIL), but concurrent cache misses may cause redundant construction (benign). A clear_parse_desc_cache() and get_parse_desc_cache_size() function are exposed for testing. ## Benchmark results (median, pytest-benchmark) ### ParseDesc construction only (reference benchmarks) | Columns | **Before** (original) | **After** (with cache) | |---------|-----------------------|------------------------| | 5 cols | 3,966 ns | 191 ns | | 10 cols | 5,730 ns | 175 ns | | 20 cols | 9,266 ns | 166 ns | | 50 cols | 19,388 ns | 193 ns | ### Full pipeline integration (recv_results_rows through Cython) | Scenario | **Before** (original) | **After** (with cache) | |-------------------|-----------------------|------------------------| | 1 row x 10 col | 40,867 ns | 2,977 ns | | 100 rows x 5 col | 145,584 ns | 73,206 ns | | 1000 rows x 5 col | 1,099,825 ns | 999,517 ns | For small result sets (single-row lookups common with prepared statements), ParseDesc construction is a large fraction of the total response-path cost. Caching eliminates it entirely after the first execution. All 623 unit tests pass (16 skipped - pre-existing).
- Fix copyright header in benchmark file (DataStax -> ScyllaDB) - Add pytest.importorskip guard for pytest-benchmark in benchmark file - Add unit tests for ParseDesc cache under tests/unit/cython: cache hit, miss, protocol version invalidation, clear, bounded eviction, correctness
9f2f3ff to
203080e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ParseDescobject constructed inrecv_results_rows()so repeated executions of the same prepared statement skip list comprehensions,ColDescconstruction, andmake_deserializers()callsid(column_metadata)with identity verification on hit — safe for prepared statements whereresult_metadatais stored onPreparedStatementand reusedclear_parse_desc_cache()function for testingBenchmark results (median, pytest-benchmark)
ParseDesc construction only
Full pipeline (ParseDesc + row parsing)
For single-row lookups (common with prepared statements), ParseDesc construction is a large fraction of the total response-path cost. Caching eliminates it entirely after the first execution.
All 116 unit tests pass (1 skipped — pre-existing test_datetype issue).